Abstract

Consider updates arriving online in which the $t$th input is $(i_t, d_t)$, where the $i_t$'s are thought of as IDs
of users. Informally, a randomized function $f$ is differentially private with respect to the IDs if the
probability distribution induced by $f$ is not much different from that induced by it on an input in
which occurrences of an ID $j$ are replaced with some other ID $k$. Recently, this notion was extended to
pan-privacy, where the computation of $f$ retains differential privacy even if the internal memory of the
algorithm is exposed to the adversary (say by a malicious break-in or by fiat by the government). This is
a strong notion of privacy, and surprisingly, for basic counting tasks such as distinct counts, heavy hitters
and others, Dwork et al. [4] present pan-private algorithms with reasonable accuracy. The pan-private
algorithms are nontrivial, and rely on sampling.

We reexamine these basic counting tasks and show improved bounds. In particular, we estimate
the distinct count $D^{(t)}$ to within $(1 \pm \epsilon) D^{(t)} \pm O(\mathrm{polylog}\, m)$, where $m$ is the number of elements in
the universe. This uses suitably noisy statistics on sketches known in the streaming literature. We also
present the first known lower bounds for pan-privacy with respect to a single intrusion. Our lower bounds
show that, even if allowed to work with unbounded memory, pan-private algorithms for distinct counts
cannot be significantly more accurate than our algorithms. Our lower bound uses noisy decoding. For
heavy hitter counts, we present a pan-private streaming algorithm that is accurate to within $O(k)$ in the
worst case; the previously known bound for this problem is arbitrarily worse. An interesting aspect of our
pan-private algorithms is that they deliberately use very small (polylogarithmic) space and tend to be
streaming algorithms, even though using more space is not forbidden.

1 Introduction



Consider updates arriving online in which the $t$th input is $(i_t, d_t)$. Define the input $S_t$ as the first $t$ updates,
i.e. $(i_1, d_1), \ldots, (i_t, d_t)$; the $i_t$'s are IDs of users from a universe $U$ of size $m$. One example is to think of this as a
"traffic log" where $i_t \in U$ is the ID of a user and $d_t$ is the time spent by the user at a particular website
of interest; another example is to think of the input as a "payment log" where $i_t \in U$ is the ID of a merchant
and $d_t$ is the transaction value, which may be positive for sales and negative for refunds. A user may
visit the site many times and a merchant may have many transactions; hence, the same $i_t$ may be seen several
times. It is of great interest to maintain various statistics on such logs. For example,

• distinct count, $D^{(t)}$, is the number of distinct $i_j$'s seen in the first $t$ updates;

• heavy hitters count $HH(k)$, informally, is the number of $i$'s that have a large total $\sum_{j \le t \mid i_j = i} d_j$ (a
precise definition is presented later);

• rarity ratio ryk) = ; 

• frequency moment $F_k = \sum_{i \in U} \left( \sum_{j \le t \mid i_j = i} d_j \right)^k$



and others. Normally, these statistics are trivial to maintain with an array $a$ of size $m$ ($a_i = \sum_{j \le t \mid i_j = i} d_j$),
and some basic bookkeeping. These statistics, in one form or the other, have a long history, and have been
considered basic in data analysis tasks over the past few decades.

Our focus is on privacy, that is, how to maintain these statistics and still preserve the privacy of IDs 
involved. There are two concerns: 

• What if the output reveals something about the IDs? For example, an adversary might estimate $D^{(t)}$
first, and then insert an update $(i, 1)$ before determining $D^{(t+1)}$, which will surely reveal whether $i$ was already in the
input prior to $t$. Likewise one can devise insertion and query strategies that will reveal information
about various IDs from other statistical queries. 

• What if the adversary gets access to the system and sees the internal memory used by the algorithm? 
This might happen not only with intruders but may even be the outcome of a legal request which 
will force us to reveal all the stored information. In this case, with the trivial solution, $a_i$ will end up
revealing information about $i$. Of course, one could hash (encrypt) IDs and index in the hashed space.
But when the memory is compromised, the hash (encryption) function will get revealed and will let the 
adversary decode by enumerating IDs. Often, sampling algorithms are used for providing statistical 
estimates, but these are vulnerable because when the internal memory is revealed, the sampled IDs 
are compromised. 

To overcome the first concern, we can adopt the notion of differential privacy [6]. Let $S'_t$ be the updates
derived from $S_t$ by replacing some occurrences of some ID $j$ with occurrences of some other ID $k$. Informally,
a randomized function $f$ is differentially private with respect to the IDs if the probability distribution induced
by $f(S_t)$ on the range of $f$ is not much different from that induced by $f(S'_t)$ for any $S'_t$ as defined above, and
any $t$. For the first 3 statistics listed above, using known techniques, it is straightforward to get differentially
private estimates; for frequency moments, one can look at a related function, the cropped frequency moment
$T_k(\tau) = \sum_{i \in U} \min\{ (\sum_{j \mid i_j = i} d_j)^k, \tau \}$, which bounds what is known as the sensitivity of the function, and get
differentially private estimates.

The authors in [3] initiated the study that addresses the second concern above. In particular, they 
defined the notion of pan-privacy. Informally, St and S' t should produce very similar distributions on both 
internal states as well as outputs. Without some "secret state" it might seem impossible to estimate statistics 
privately, but [4] showed that some of the statistics above can be estimated accurately. Their main results 
were for streaming algorithms, that use space polylogarithmic in m and other parameters. In particular, they 
showed pan-private streaming algorithms for rarity ratio, distinct count, cropped mean $T_1$ and a version of
heavy hitters.

We are inspired by this work [4] and this emerging direction of pan-private algorithms [5] to revisit these
problems. There are some outstanding fundamental questions:

• Is there a cost to pan-privacy, that is, are there problems for which pan-privacy provably needs more
resources or loss of accuracy, compared to just differential privacy?

• What is the impact of memory in pan-privacy? Since the memory used by the algorithm may get 
revealed to an adversary, do pan-private algorithms use very small memory like in the streaming 
algorithms of [4], or can they use large memory to better encode information about the input and get
better accuracy? 

• Technically, [3] used samples and adapted techniques from randomized response [13] such as distorting 
counters with random shifts or using two distinct distributions. In contrast, in streaming [5], some of 
the most powerful algorithms use sketches that are linear projections of data along random directions. 
Do sketches provide improved or richer pan-private algorithms? 

Our Contributions We address these questions and make the following main contributions. We focus on 
the basic model of pan-privacy as formulated by [1] where memory may be breached by an adversary once 
unannounced to the algorithm (and later comment on the variants of the model). 

• Distinct Counts. We present a streaming algorithm that is $\epsilon$-pan-private and outputs an estimate
$(1 + \epsilon) D^{(t)} \pm \mathrm{polylog}(m)$. It directly uses a previously known sketch based on stable distributions for estimating
distinct counts [2], but maintains noisy versions. In fact, this approach is powerful and our pan-private
algorithms even work for turnstile streams where the $d_j$'s may be negative, the first pan-private algorithms
to have this property. In contrast, the best previous result for pan-private streaming estimation of distinct
count outputs an estimate $D^{(t)} \pm \alpha m$ with constant probability, for only nonnegative updates [4]. Note
that the stable distribution based approach is known to yield streaming algorithms for $F_k$ for $0 < k \le 2$
[5], but this analogy does not work to get a pan-private estimate of $F_k$ (adapted as $T_k$); therefore, that
it works for pan-private $k = 0$ (which is related to distinct counts) is very interesting.

We complement this result by showing lower bounds. Let $A$ be an online (not necessarily streaming)
algorithm that on input $S_t$ outputs $D^{(t)} \pm o(\sqrt{m})$ with small constant probability. Then $A$ is not $\epsilon$-pan-private
for any constant $\epsilon$. This is the first known lower bound for any pan-private algorithm in this
model. In fact, we develop an approach to showing lower bounds (which may be of independent interest
in the future) that takes a copy of the memory by breaching the algorithm once, and then simulates
the algorithm with random inputs in parallel with this seed memory, like noisy decoding [3]. Our lower
bound holds no matter the memory used by $A$, even if the memory is $\Omega(m)$. Thus, $A$ need not be
a streaming algorithm. Our lower bound is not like the ones in the streaming literature where the lower
bound is conditioned on using small space, or like in differentially private optimization [12] where one
shows a structural relationship between "near" configurations of inputs. We show this lower bound is
essentially tight if $A$ is not streaming: we show a simple pan-private algorithm that outputs an estimate
$D^{(t)} \pm O(\sqrt{m})$ with constant probability and maintains $O(m)$ memory. Further, we show a lower bound
of $(1 + \epsilon) D^{(t)} \pm \mathrm{polylog}(m)$, essentially tight up to additive polylog terms with our streaming algorithm.

• Heavy Hitters Count. As is standard in the streaming literature, we define $HH^{(t)}(k)$ as the number of IDs $i$
with $\sum_{j \mid i_j = i} d_j > F_1^{(t)}/k$. In this notation, [4] approximates $HH(k)$ within an additive error of $O(\alpha m)$
for any constant $\alpha$. However, $m$ can far exceed $k$, which is an upper bound on $HH(k)$.

We present a pan-private streaming algorithm that returns an estimate in
$[(1-\epsilon) HH(k) - O(\sqrt{k}),\ HH(O(k^2)) + O(\sqrt{k})]$ (that is, no worse than an $O(k)$ approximation, up to additive
errors), which is a significant improvement over [4]. We obtain this by first observing that with $O(m)$ space, we can
provide an estimate $HH(k) \pm O(\sqrt{m})$, and then using this only on the space of all buckets in the Count-Min
sketch [T], which uses much smaller space.

Some comments: (1) Both of our results above are obtained using sketches, which is different from the use of
samples thus far [4]. Also, we use full space versions on top of sketches to get best-known accuracies. (2) An
interesting aspect of our pan-private algorithms is that, they deliberately use very small (polylogarithmic) 
space and are streaming even though using more space is not forbidden. (3) Our insights from above yield 
other upper bounds (pan-private streaming algorithms for T^, inner products of vectors etc) and lower bounds 
(inner products). (4) We are adopting the pan-privacy model from [1] as a given and refer the readers to that
original work for motivating and defending the model as well as discussion related to the model such as, 
what if a small amount of secret storage is allowed, or what if adversary is allowed to look at the memory 
multiple times or even continually and so on. For the purposes of this paper, the basic pan-privacy model 
is of great interest and there are fundamental technical problems that we address. (5) Likewise, the specific
statistics we have considered have many applications that have been identified over the past decade from 
databases to data streams, compressed sensing and beyond [9]. We do not elaborate on this further, instead 
addressing how these problems can be solved. (6) Finally, we have focused on counts throughout. Many of 
these problems have a corresponding "list" version in which output comprises specific IDs. We have left it 
open to identify suitable pan-private versions of these problems. 

Map. In Section 2, we introduce relevant definitions and notation. In Section 3, we present our upper and 
lower bounds for distinct count estimation. In Section 4, we present our upper bound for heavy hitter count 
estimation. In Section 5, we have concluding remarks with other extensions. 

2 Preliminaries 

2.1 Definitions and Notation 

We are given a universe $U$, where $|U| = m$. An update is defined as an ordered pair $(i, d) \in U \times \mathbb{Z}$. Consider
a semi-infinite sequence of updates $(i_1, d_1), (i_2, d_2), \ldots$; the input for all our algorithms consists of the first $t$
updates, denoted $S_t = (i_1, d_1), \ldots, (i_t, d_t)$. The state after $t$ updates is an $m$-dimensional vector $a^{(t)}$, indexed
by the elements in $U$ (we will omit the superscript when it is clear from the context). The elements of the
vector $a = a^{(t)}$, referred to as the state vector, are defined as follows:

$$a_i = \sum_{j \le t \,:\, i_j = i} d_j.$$

We consider two models: the cash register model in which all updates are positive, i.e. $\forall j : d_j > 0$, and
the turnstile model in which updates can be both positive (inserts), i.e. $d_j > 0$, and negative (deletes),
i.e. $d_j < 0$. We note that the turnstile model has not been considered in pan-privacy before.

Our algorithms output a real number which approximates one of the following statistics on St: 

• distinct count: $D = D^{(t)} = |\{ i \in U : a_i \ne 0 \}|$;

• $k$-th frequency moment: $F_k = F_k^{(t)} = \sum_{i \in U} |a_i|^k$. This coincides with $\|a\|_k^k$, the $k$-th power of the norm
$\|a\|_k$ of the state vector $a$, and we will use either term to facilitate exposition.

• $k$-th cropped frequency moment: $T_k(\tau) = T_k^{(t)}(\tau) = \sum_{i \in U} \min\{ |a_i|^k, \tau \}$.

• cropped dot product: Given two sequences of updates $S_t$ and $S'_t$ with state vectors $a$ and $a'$, the cropped
dot product is $(a \cdot a')(\tau) = \sum_{i \in U} \min\{ a_i a'_i, \tau \}$.

• $k$-heavy hitters count: $HH(k) = HH^{(t)}(k) = |\{ i : |a_i| > F_1^{(t)}/k \}|$.
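
As a point of reference, the statistics above are easy to compute exactly (and without any privacy) once the state vector is available. The following Python sketch does this for a toy update sequence; the function names and the sample data are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict


def state_vector(updates):
    """Accumulate a_i = sum of d_j over all updates (i_j, d_j) with i_j == i."""
    a = defaultdict(int)
    for i, d in updates:
        a[i] += d
    return a


def distinct_count(a):
    """D: number of IDs whose accumulated value is non-zero."""
    return sum(1 for v in a.values() if v != 0)


def frequency_moment(a, k):
    """F_k for k >= 1 (for k = 0 use distinct_count, since 0 ** 0 == 1 in Python)."""
    return sum(abs(v) ** k for v in a.values())


def cropped_moment(a, k, tau):
    """T_k(tau): each term |a_i|^k is capped at tau."""
    return sum(min(abs(v) ** k, tau) for v in a.values())


def heavy_hitters_count(a, k):
    """HH(k): number of IDs with |a_i| > F_1 / k."""
    f1 = frequency_moment(a, 1)
    return sum(1 for v in a.values() if abs(v) > f1 / k)


# Turnstile-style toy input: the second update to "u2" cancels the first.
updates = [("u1", 3), ("u2", 1), ("u1", 2), ("u3", 4), ("u2", -1)]
a = state_vector(updates)
print(distinct_count(a), frequency_moment(a, 2), heavy_hitters_count(a, 3))
```

Storing the dictionary `a` in the clear is exactly the "trivial solution" that pan-privacy rules out, which is why the later sections replace it with noisy sketches.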
2.2 Differential Privacy 

Dwork et al. [6] introduce the concept of differential privacy which operates on a data set consisting of rows 
of data, where each row consists of the data of an individual. Differential privacy provides a guarantee that 
the probability distribution on the outputs of a mechanism is "almost the same" , irrespective of whether an 
individual opts in to, or out of, the data set. Such a guarantee incentivizes participation of individuals in a 
database by assuring them of incurring very little risk by such a participation. Formally,

Definition 1 ([6]). A randomized function $f$ provides $\epsilon$-differential privacy if for all neighboring (differing
in at most one row) data sets $D$ and $D'$, and all $Y \subseteq \mathrm{Range}(f)$,

$$\Pr[f(D) \in Y] \le \exp(\epsilon) \times \Pr[f(D') \in Y].$$
One mechanism that [6] use to provide differential privacy is the so-called "Laplacian noise method",
which depends on the global sensitivity of a function:

Definition 2 ([6]). For $f : \mathcal{D} \to \mathbb{R}^d$, the global sensitivity of $f$ is

$$GS_f = \max_{D, D'} \| f(D) - f(D') \|_1,$$

where the maximum is over all neighboring data sets $D$ and $D'$.

The Laplace distribution with mean 0 and scale parameter $b$, denoted $\mathrm{Lap}(b)$, has density function $p(x) =
\frac{1}{2b} \exp(-|x|/b)$. The following theorem from [6] uses the Laplace distribution to construct a differentially
private mechanism:

Theorem 1 ([6]). For $f : \mathcal{D} \to \mathbb{R}$, a mechanism $\mathcal{M}$ that adds independently generated noise drawn from
$\mathrm{Lap}(GS_f / \epsilon)$ to the output preserves $\epsilon$-differential privacy.
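
To make Theorem 1 concrete, here is a minimal Python sketch of the Laplace mechanism for a real-valued query. The helper names are hypothetical. The sampler relies on the standard fact that the difference of two independent unit-rate exponential variables is Laplace-distributed with scale 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Lap(scale): a difference of two i.i.d. Exp(1) draws, rescaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_value, global_sensitivity, epsilon, rng):
    """Release true_value plus Lap(GS_f / epsilon) noise, as in Theorem 1."""
    return true_value + laplace_noise(global_sensitivity / epsilon, rng)


# Example: a counting query changes by at most 1 between neighboring data sets,
# so its global sensitivity is 1.
rng = random.Random(2024)
noisy_count = laplace_mechanism(true_value=120, global_sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy_count)
```

A smaller $\epsilon$ gives stronger privacy but a larger noise scale $GS_f/\epsilon$, so the expected absolute error of the released value grows as $GS_f/\epsilon$.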

2.3 Pan-privacy 

While differential privacy provides meaningful guarantees to mitigate the risks of an individual being identified
by participating in a data set, individuals might also be concerned about retaining similar guarantees even if
the internal state is revealed, say, because of a subpoena. Mechanisms that achieve this property are called
pan-private [4]. Pan-privacy guarantees a participant that his/her risk of being identified by participating
in a data set is very small even if there is an external intrusion on the data. Formally, consider two online
sequences of updates $S = \{(i_1, d_1), \ldots, (i_t, d_t)\}$ and $S' = \{(i'_1, d'_1), \ldots, (i'_{t'}, d'_{t'})\}$ associated with state vectors $a$ and $a'$
respectively.

Definition 3. $S$ and $S'$ are said to be neighbors if there exists a (multi)set of updates in $S$ indexed by
$K \subseteq [t]$ that update the same ID $i \in U$, and there exists a (multi)set of updates in $S'$ indexed by $K' \subseteq [t']$
that update some ID $j \in U$, $j \ne i$, such that $\sum_{k \in K} d_k = \sum_{k \in K'} d'_k$, and for all other updates in $S$ and $S'$, indexed
by $Q = [t] - K$ and $Q' = [t'] - K'$ respectively,

$$\forall i \in U : \sum_{k \in Q \text{ s.t. } i_k = i} d_k = \sum_{k \in Q' \text{ s.t. } i'_k = i} d'_k.$$

Notice that in the definition above $t$ and $t'$ don't have to be equal because we allow the $d_j$'s to be
integers. The definition ensures that two inputs are neighbors if some of the occurrences of an ID in $S$ are
replaced by some other ID in $S'$ and everything else essentially stays the same except (a) the order may be
arbitrarily different and (b) the updates can be broken up since they are not constrained to be 1's. The
neighbor relation preserves the first frequency moment of the sequence of updates, considered to be public 
information. Also, the graph induced by the neighbor relation on any set of sequences with the same first 
frequency moment is connected. 

Definition 4 (User level pan-privacy [4]). Let Alg be an algorithm. Let $I$ denote the set of internal states
of the algorithm, and let $\sigma$ be the set of possible output sequences. Then algorithm Alg, mapping input prefixes
to the range $I \times \sigma$, is pan-private (against a single intrusion, see footnote 1) if for all sets $I' \subseteq I$ and $\sigma' \subseteq \sigma$, and for all
pairs of user-level neighboring data stream prefixes $S$ and $S'$,

$$\Pr[\mathrm{Alg}(S) \in (I', \sigma')] \le e^{\epsilon} \Pr[\mathrm{Alg}(S') \in (I', \sigma')],$$

where the probability spaces are over the coin flips of the algorithm Alg.

Footnote 1: See [4] for discussion about multiple intrusions.
3 Distinct Count Estimation

In this section we present upper and lower bounds for the problem of pan-private estimation of the distinct
count statistic $D^{(t)}$. We utilize a sketching approach based on a stable distribution. In contrast with the
sampling approach of Dwork et al. [4], the sketching approach works in the more general turnstile model and,
for the usual range of $D^{(t)}$, achieves significantly better accuracy. We present our algorithm for distinct count
estimation as evidence of the usefulness of the sketching approach for designing pan-private algorithms.

We complement our upper bound with lower bounds based on noisy decoding. Our results present the
first lower bounds against pan-private algorithms that allow a single intrusion.



3.1 Upper Bounds 

Consider the turnstile model where the $d_j$'s could either be positive or negative, and assume an upper bound
on the absolute value of each element of the state vector: $\forall i \in U$, $|a_i| \le Z$. We are interested in a pan-private
computation of $D^{(t)} = |\{ i : a^{(t)}[i] \ne 0 \}|$. Note that where the superscripts don't appear, a time slice of $t$
is implicit. Recall that the $L_p$ norm of a vector $a$ is $\|a\|_p = (\sum_i |a_i|^p)^{1/p}$.



3.2 Prior Approach in Streaming Algorithms 

[2] show that, for sufficiently small $p$ ($0 < p < \epsilon/\log Z$),

$$D^{(t)} \le \sum_i |a_i|^p \le (1 + \epsilon) D^{(t)}. \qquad (1)$$

Hence, it suffices to estimate the $L_p$ norm of $a$ for certain small $p$ for estimating the distinct counts. For
this purpose they use what are called stable distributions.
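
The intuition behind Equation (1) is that, for integer entries with $1 \le |a_i| \le Z$, each non-zero term $|a_i|^p$ lies between $1$ and $Z^p$. The short Python check below illustrates this on a toy state vector. It is only a sketch: the vector, $Z$ and $\epsilon$ are made-up values, and it uses the natural logarithm, for which the provable upper bound is $e^{\epsilon} D$, which is close to $(1+\epsilon) D$ when $\epsilon$ is small.

```python
import math


def sum_abs_powers(a, p):
    """Sum of |a_i|^p over the non-zero entries of the state vector."""
    return sum(abs(v) ** p for v in a if v != 0)


Z = 1000                      # assumed bound on |a_i|
eps = 0.1                     # target relative accuracy
p = 0.9 * eps / math.log(Z)   # strictly below eps / ln Z

a = [0, 1, -7, 1000, 0, 42, -1000, 3]     # toy turnstile state vector
D = sum(1 for v in a if v != 0)           # exact distinct count
s = sum_abs_powers(a, p)

print(D, s, math.exp(eps) * D)
```

Because every non-zero term is at least 1, the sum never underestimates $D$; the overestimate is controlled by how close $Z^p$ is to 1.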



Stable distributions and their use in sketches. A distribution $\mathcal{D}$ over $\mathbb{R}$ is said to be $p$-stable if there
exists $p > 0$ such that for any $m$ real numbers $b_1, \ldots, b_m$ and i.i.d. variables $Y_1, \ldots, Y_m$ with distribution $\mathcal{D}$,
the random variable $\sum_i b_i Y_i$ has the same distribution as the random variable $(\sum_i |b_i|^p)^{1/p} Y$, where $Y$ is a
random variable with distribution $\mathcal{D}$. Let $X$ be a matrix of random values of dimension $m \times r$, where
each entry of the matrix $X_{i,j}$, $1 \le i \le m$ and $1 \le j \le r$, is drawn independently from a stable
distribution with parameter $p$, with $p$ as small as possible. The sketch vector $sk(a)$ is defined as the
product of the matrix $X^T$ with $a$, so

$$sk(a)_j = \sum_{i=1}^{m} X_{i,j} a_i = X_j \cdot a,$$

where $X_j$ is an $m$-dimensional vector composed of the following elements: $(X_{1,j}, X_{2,j}, \ldots, X_{m,j})$.

From the property of stable distributions we know that each entry of $sk(a)$ is distributed as $(\sum_i |a_i|^p)^{1/p} X_0$,
where $X_0$ is a random variable chosen from a $p$-stable distribution. The sketch is used to compute $\sum_i |a_i|^p$
for $0 < p < \epsilon/\log Z$, from which we can approximate $D^{(t)}$ up to a $(1 + \epsilon)$ factor. By construction, any $|sk(a)_j|^p$
can be used to estimate $L_p^p$. [2] obtain a good estimator for $\sum_i |a_i|^p$ by taking the median of all entries
$|sk(a)_j|^p$ over $j$:

Lemma 1 ([2]). With probability $1 - \delta$, if $r = O(1/\epsilon^2 \cdot \log(1/\delta))$,

$$(1 - \epsilon)^p \,\mathrm{median}_j |sk(a)_j|^p \le \mathrm{median}\, |X_0|^p \left( \sum_i |a_i|^p \right) \le (1 + \epsilon)^p \,\mathrm{median}_j |sk(a)_j|^p,$$

where $\mathrm{median}\, |X_0|^p$ is the median of absolute values (raised to the power $p$) from a $p$-stable distribution.

Using the results of Equation (1) and Lemma 1, Cormode et al. [2] prove that:

Theorem 2 ([2]). The computation of a sketch $sk(a)$ of online data described by a state vector $a$ that
requires space $O(1/\epsilon^2 \cdot \log(1/\delta))$ allows an approximation of $D^{(t)}$ within a factor of $1 \pm \epsilon$ of the true answer
with probability $1 - \delta$.
Maintaining the sketch under updates. As updates arrive, the sketch vector is built progressively. It
is initialized to be the zero vector, and on receiving tuple $(i, d_k)$, the update is done by adding $d_k$ times $X_{i,j}$
to each entry $sk(a)_j$, $j \in [r]$, of the sketch vector. That is,

$$\forall j \in [r] : sk(a)_j \leftarrow sk(a)_j + d_k X_{i,j}.$$

In order to avoid precomputing and storing all the values $X_{i,j}$, Cormode et al. [2] generate the random
variables $X_{i,j}$ from a stable distribution on the fly by using $i$ to seed a pseudo-random number generator
random(). These pseudo-randomly generated numbers are then used to generate a sequence of $p$-stable
distributed random variables using a (deterministic) function $\mathrm{stable}(r_1, r_2, p)$, where $r_1$ and $r_2$ are pseudo-random
variables in the range $[0 \ldots 1]$ drawn from random(). The function is defined as follows: first define
a quantity $\theta = \pi (r_1 - 1/2)$. Now,

$$\mathrm{stable}(r_1, r_2, p) = \frac{\sin p\theta}{\cos^{1/p} \theta} \left( \frac{\cos(\theta (1 - p))}{-\ln r_2} \right)^{\frac{1-p}{p}}.$$

Since each time the same seed $i$ is used, this ensures that $X_{i,j} = \mathrm{stable}(r_1, r_2, p)$ takes the same value each
time it is used. We will find this technique useful for our own purpose of precomputing the global sensitivity
of a sketch in the next section.
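
The sketch maintenance described above can be prototyped in a few lines of Python. The code below is a non-private illustration only, under several assumptions that are ours: $p = 0.1$ (larger than the analysis would require, to stay within floating-point range), $r = 800$ sketch entries, Python's `random.Random(i)` as the ID-seeded generator, and an empirically estimated median of $|X_0|^p$ used as the normalizing constant.

```python
import math
import random
import statistics


def stable(r1, r2, p):
    """Symmetric p-stable variate built from two uniforms r1, r2 in (0, 1)."""
    theta = math.pi * (r1 - 0.5)
    w = -math.log(r2)
    left = math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
    right = (math.cos(theta * (1.0 - p)) / w) ** ((1.0 - p) / p)
    return left * right


def draw(rng, p):
    """One stable draw; r2 is clamped away from 0 so that log(r2) is defined."""
    r1 = rng.random()
    r2 = max(rng.random(), 1e-300)
    return stable(r1, r2, p)


def update(sketch, i, d, p):
    """Apply update (i, d): seeding with i regenerates the same X[i][j] each time."""
    rng = random.Random(i)
    for j in range(len(sketch)):
        sketch[j] += d * draw(rng, p)


def estimate(sketch, p, median_x0):
    """Median of |sk_j|^p, normalized by the median of |X_0|^p (cf. Lemma 1)."""
    return statistics.median(abs(s) ** p for s in sketch) / median_x0


P, R = 0.1, 800
calib = random.Random(10**9)   # separate stream used only for calibration
MEDIAN_X0 = statistics.median(abs(draw(calib, P)) ** P for _ in range(20000))

sketch = [0.0] * R
for i in range(250):           # 250 IDs inserted once each
    update(sketch, i, 1, P)
for i in range(250, 260):      # 10 IDs inserted and then fully deleted
    update(sketch, i, 2, P)
    update(sketch, i, -2, P)

print(estimate(sketch, P, MEDIAN_X0))   # estimate of the distinct count (true value: 250)
```

Since the sketch is linear in the state vector, deletions are handled by the same update rule with a negative $d$. In floating point the cancellation is only approximate when a deleted entry happens to be many orders of magnitude larger than the rest, which is one reason this is a sketch rather than production code.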

3.3 Pan-Private Algorithm 

To get pan-privacy, we maintain these (approximate) sketches in a differentially private way. In particular,
we maintain a noisy sketch vector where each element of the sketch vector has noise added according to the
sensitivity method of [6].

Adding Laplacian noise to the sketches. The global sensitivity $GS_j$ of a sketch entry $sk(a)_j$, from
Definition 2, is

$$GS_j = 2 \cdot Z \|X_j\|_\infty.$$

Consider state vectors $a$ and $a'$ corresponding to two neighboring sequences of online updates $S$ and $S'$
respectively. From Definition 3 there exist some $i \in U$ and some $k \ne i \in U$ such that some occurrences
of $i$ in the sequence of updates in $S$ are replaced by some occurrences of $k$ to get $S'$. This means that $a_i \ne a'_i$
and $a_k \ne a'_k$, and for any other $l$ not equal to $i$ or $k$, $a_l = a'_l$. So, for any neighboring $S$ and $S'$,

$$\| X_j \cdot a - X_j \cdot a' \|_1 \le | X_{i,j} a_i - X_{i,j} a'_i + X_{k,j} a_k - X_{k,j} a'_k | \le 2 \cdot Z \|X_j\|_\infty.$$

From [6], it will follow that we need to add Laplacian noise based on this sensitivity to have a differentially
private description of the state at any point, which is pan-private with respect to a single intrusion. Since the
elements of $X_j$ are random quantities independent of the data, we can compute the $L_\infty$ norm of the actual
vector that we end up using without compromising on privacy. However, the challenge is that $\|X_j\|_\infty$ is not
known in advance. The use of index $i$ to seed the pseudorandom generator, and the use of the pseudorandomly
generated values to generate the $X_{i,j}$'s, means that this challenge can be solved by computing $\|X_j\|_\infty$, $\forall j$,
before the onset of our algorithm (shown in Algorithm 1). Also, to use the result of Lemma 1, the value
of $\mathrm{median}\, |X_0|^p$, the median of absolute values from a $p$-stable distribution, needs to be computed. This is
also done numerically in advance in [2], and then the final result is scaled by this constant factor, denoted as
$sf(p)$.

Algorithm 1 modifies the algorithm in [2] by maintaining $\alpha$-differentially private sketches of the stream
vector $a$.

Each sketch is initialized with a noisy value drawn from the appropriate Laplace distribution. Formally,
let $sk(a)^{priv}_j = sk(a)_j + \eta_j$, where $\eta_j$ is a random variable drawn from a Laplace distribution with mean 0
and scaling factor $GS_j / \alpha$. Here $\alpha$ is the privacy parameter. Since we maintain $r$ sketches of the data,
Algorithm 1 gives us an overall privacy of $\alpha' = \alpha r$ as per the composition theorem [6]:

Theorem 3 ([6]). Given mechanisms $\mathcal{M}_i$, $i \in [r]$, each of which provides $\alpha_i$-differential privacy, the
overall mechanism $\mathcal{M}$ that consists of a composition of these $r$ mechanisms provides $(\sum_{i \in [r]} \alpha_i)$-differential
privacy.
Algorithm 1 Pan-private approximation of $D^{(t)}$

INPUT: privacy parameter $\alpha$; $0 < p < \epsilon/Z < 1$; $\|X_j\|_\infty$ for all $j \in [r]$, computed off-line; an $r$-dimensional
noise vector $\eta$, where $\eta_j \sim \mathrm{Lap}(2 \|X_j\|_\infty Z / \alpha)$; $sf(p) = \mathrm{median}\, |X_0|^p$, also computed off-line numerically.

initialize the $r$-dimensional sketch vector $sk(a)^{priv}$ such that $sk(a)^{priv}_j = \eta_j$
for all tuples $(i, d_t)$ do
    initialize random with $i$
    for all $j = 1$ to $r$ do
        r1 = random()
        r2 = random()
        $sk(a)^{priv}_j = sk(a)^{priv}_j + d_t * \mathrm{stable}(r1, r2, p)$
    end for
end for
return $\hat{D} = \mathrm{median}_j |sk(a)^{priv}_j|^p \cdot sf(p)$
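
For readers who prefer code, the following Python sketch mirrors the structure of Algorithm 1. It is an illustration under assumptions that are ours rather than the paper's: the class name `PanPrivateDistinctCount` is hypothetical, the universe is taken to be the integers $0, \ldots, m-1$ so that $\|X_j\|_\infty$ can be precomputed by regenerating every row from its seed, and the scale factor `sf` is passed in by the caller (one natural choice, assumed here, is the reciprocal of an empirically estimated median of $|X_0|^p$).

```python
import math
import random
import statistics


def stable(r1, r2, p):
    """Symmetric p-stable variate built from two uniforms r1, r2 in (0, 1)."""
    theta = math.pi * (r1 - 0.5)
    w = -math.log(r2)
    left = math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
    right = (math.cos(theta * (1.0 - p)) / w) ** ((1.0 - p) / p)
    return left * right


def row(i, r, p):
    """X[i][0..r-1], regenerated deterministically from the seed i."""
    rng = random.Random(i)
    return [stable(rng.random(), max(rng.random(), 1e-300), p) for _ in range(r)]


def linf_norms(m, r, p):
    """Off-line step: ||X_j||_inf for every sketch coordinate j."""
    norms = [0.0] * r
    for i in range(m):
        for j, x in enumerate(row(i, r, p)):
            norms[j] = max(norms[j], abs(x))
    return norms


def laplace(scale, rng):
    """Lap(scale) as a rescaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


class PanPrivateDistinctCount:
    def __init__(self, m, r, p, Z, alpha, sf, noise_rng):
        self.r, self.p, self.sf = r, p, sf
        # Global sensitivity of each sketch coordinate: 2 * Z * ||X_j||_inf.
        self.gs = [2.0 * Z * n for n in linf_norms(m, r, p)]
        # The only stored state: the sketch, initialized to pure Laplace noise.
        self.sk = [laplace(g / alpha, noise_rng) for g in self.gs]

    def update(self, i, d):
        for j, x in enumerate(row(i, self.r, self.p)):
            self.sk[j] += d * x

    def estimate(self):
        return statistics.median(abs(s) ** self.p for s in self.sk) * self.sf


alg = PanPrivateDistinctCount(m=200, r=25, p=0.1, Z=10, alpha=0.5,
                              sf=1.0, noise_rng=random.Random(7))
for i in range(50):
    alg.update(i, 1)
print(alg.estimate())
```

Note that the object never stores IDs or the state vector $a$; an intruder who reads `alg.sk` sees only the noisy sketch. With the toy parameters above the output is not meant to be accurate, since the example is about the data flow rather than the error bound.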



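Our main result for distinct count estimation is to prove that $\hat{D}$ returned by Algorithm 1 provides an
$\alpha'$-differentially private approximation of $D^{(t)}$:

Theorem 4. With probability $1 - (r + 1)\delta$, Algorithm 1 computes an $\alpha'$-pan-private approximation $\hat{D}$ of
$D^{(t)}$ such that

$$(1 - \epsilon) D^{(t)} - O\left( \mathrm{poly}\left( \log(m) \cdot (1 + \epsilon) \log(\tfrac{1}{\delta}) \tfrac{1}{\alpha'} \right) \right) \le \hat{D} \le (1 + \epsilon) D^{(t)} + O\left( \mathrm{poly}\left( \log(m) \cdot (1 + \epsilon) \log(\tfrac{1}{\delta}) \tfrac{1}{\alpha'} \right) \right).$$

We will need Claim 1 and Lemmas 2 and 3 for this purpose: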

Claim 1. For any two real numbers $x$ and $y$ and for any $p \in [0, 1)$, we have

$$|x|^p - |y|^p \le |x + y|^p \le |x|^p + |y|^p.$$

Proof. First, assume $x$ and $y$ are either both positive or both negative. For any $x, y \in \mathbb{R}^+$ consider the functions
$g_{x,y}(p) = x^p + y^p$ and $f_{x,y}(p) = (x + y)^p$. At $p = 1$, $\forall x, y \in \mathbb{R}^+$, the two functions intersect as $x^1 + y^1 = (x + y)^1$.
At $p = 0$, $g_{x,y}(p) > f_{x,y}(p)$. We want to prove that for $p \in [0, 1)$, $g_{x,y}(p) \ge f_{x,y}(p)$. For convenience, we drop
the subscript $x, y$.

WLOG assume $x \ge y$; then $f(p) = x^p (1 + \frac{y}{x})^p$ and $g(p) = x^p (1 + (\frac{y}{x})^p)$. So,

$$\frac{f(p)}{g(p)} = \frac{(1 + \frac{y}{x})^p}{1 + (\frac{y}{x})^p}.$$

The numerator: $(1 + \frac{y}{x})^p \le 1 + \frac{y}{x}$, for $p \in [0, 1)$, $\forall\, \frac{y}{x} \le 1$.

The denominator: $1 + (\frac{y}{x})^p \ge 1 + \frac{y}{x}$, for $p \in [0, 1)$, $\forall\, \frac{y}{x} \le 1$.

Therefore,

$$\frac{f(p)}{g(p)} \le \frac{1 + \frac{y}{x}}{1 + \frac{y}{x}} = 1.$$

So for any $x, y \in \mathbb{R}^+$, we have $f_{x,y}(p) \le g_{x,y}(p)$ for $p \in [0, 1)$. Similarly, assume $x$ is positive and $y$ is
negative, and WLOG assume $a = |x| \ge |y| = b$. Then $|x + y| = a - b$, and we can similarly prove that for
$a, b \in \mathbb{R}^+$,

$$(a - b)^p \ge a^p - b^p.$$

□
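
Claim 1 is easy to sanity-check numerically. The Python snippet below tests both inequalities on random inputs; it is an illustrative check with an assumed floating-point tolerance, not a substitute for the proof.

```python
import random


def claim1_holds(x, y, p, tol=1e-9):
    """Check |x|^p - |y|^p <= |x + y|^p <= |x|^p + |y|^p up to a small tolerance."""
    lower = abs(x) ** p - abs(y) ** p
    middle = abs(x + y) ** p
    upper = abs(x) ** p + abs(y) ** p
    return lower - tol <= middle <= upper + tol


rng = random.Random(1)
trials = [(rng.uniform(-1000, 1000), rng.uniform(-1000, 1000), rng.random())
          for _ in range(10000)]
print(all(claim1_holds(x, y, p) for x, y, p in trials))
```

In the analysis the claim is applied with $x = sk(a)_j$ and $y = \eta_j$, so that the effect of the Laplace noise on $|sk(a)^{priv}_j|^p$ is at most $|\eta_j|^p$ in either direction.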
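Lemma 2. With probability $1 - \delta$, for any $j \in [r]$, with $0 < p < \epsilon/Z < 1$,

$$|sk(a)_j|^p - \xi \le |sk(a)^{priv}_j|^p \le |sk(a)_j|^p + \xi,$$

where $\xi = \left( \frac{2 \cdot Z \max_j \|X_j\|_\infty}{\alpha} \log(\frac{1}{\delta}) \right)^p$.

Proof. We have

$$|sk(a)^{priv}_j|^p = |sk(a)_j + \eta_j|^p.$$

From Claim 1 we have:

$$|sk(a)_j|^p - |\eta_j|^p \le |sk(a)^{priv}_j|^p \le |sk(a)_j|^p + |\eta_j|^p.$$

Also, since $\eta_j$ is drawn from a Laplacian distribution, we know that with probability $1 - \delta$,

$$|\eta_j| \le \frac{GS_j}{\alpha} \log(\tfrac{1}{\delta}) \le \frac{2 Z \max_j \|X_j\|_\infty}{\alpha} \log(\tfrac{1}{\delta}).$$

□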



Since Algorithm 1 computes $\hat{D}$ by taking the (scaled) median of the $|sk(a)^{priv}_j|^p$'s and Lemma 1 relates the
median of the $|sk(a)_j|^p$'s to $\sum_i |a_i|^p$, we need to bound $\mathrm{median}_j |sk(a)^{priv}_j|^p$ in terms of $\mathrm{median}_j |sk(a)_j|^p$.

Lemma 3. Let $x_1, \ldots, x_r$ and $y_1, \ldots, y_r$ be two sequences of real numbers satisfying $\forall i : x_i - E \le y_i \le x_i + E$.
Then

$$\mathrm{median}_i\, x_i - E \le \mathrm{median}_i\, y_i \le \mathrm{median}_i\, x_i + E.$$

Proof. Assume, WLOG, that $x_1, \ldots, x_r$ are sorted in increasing order and $\mathrm{median}_i\, x_i = x_{\lceil r/2 \rceil}$. Let $\mathrm{median}_i\, y_i = y_j$.
We will prove that $y_j \ge x_{\lceil r/2 \rceil} - E$, and the other side of the inequality will follow by an analogous
argument.

If $j \ge \lceil r/2 \rceil$, then $y_j \ge x_j - E \ge x_{\lceil r/2 \rceil} - E$. Therefore, we may assume $j < \lceil r/2 \rceil$. Because $y_j$ has rank
$\lceil r/2 \rceil$ in $y_1, \ldots, y_r$, there exist indices $k_1, \ldots, k_d > j$, where $d = \lceil r/2 \rceil - j$, s.t. $y_{k_1}, \ldots, y_{k_d} \le y_j$. At least
one of $k_1, \ldots, k_d$ is greater than or equal to $\lceil r/2 \rceil$; let the smallest such index be $\ell$. Then we have

$$y_j \ge y_\ell \ge x_\ell - E \ge x_{\lceil r/2 \rceil} - E.$$

□

Now we prove that $\hat{D}$, returned by Algorithm 1, gives a good approximation to $D^{(t)}$:



Lemma 4. Algorithm 1 computes an $\alpha'$-pan-private approximation of $D^{(t)}$ using space $O(1/\epsilon^2 \log(1/\delta))$.
The approximation guarantee is: with probability at least $1 - (r + 1)\delta$ and with $\xi$ as in Lemma 2,

$$(1 - \epsilon) D^{(t)} - \xi \cdot sf(p) \le \hat{D} \le (1 + \epsilon) D^{(t)} + \xi \cdot sf(p) \qquad (*)$$

where $r = O(1/\epsilon^2 \cdot \log 1/\delta)$.

Proof. Since each sketch $sk(a)^{priv}_j$ is $\alpha$-differentially private according to the sensitivity method of [6], and we
have $r$ such sketches, the overall privacy of the algorithm is $\alpha r = \alpha'$. Each sketch is a differentially private
description of the state and hence the algorithm achieves $\alpha'$-pan-privacy. Now we prove the approximation
guarantee:

We have $\hat{D} = \mathrm{median}_j |sk(a)^{priv}_j|^p \cdot sf(p)$. Using Lemma 2 we have, with probability at least $1 - r \cdot \delta$, $\forall j$
simultaneously:

$$|sk(a)_j|^p - \xi \le |sk(a)^{priv}_j|^p \le |sk(a)_j|^p + \xi.$$

From Lemma 3 we therefore have, with probability at least $1 - r\delta$:

$$\mathrm{median}_j |sk(a)_j|^p - \xi \le \mathrm{median}_j |sk(a)^{priv}_j|^p \le \mathrm{median}_j |sk(a)_j|^p + \xi.$$

Using Lemma 1 and Equation (1), and noting that $\alpha' = \alpha r$, the result follows.

□

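Since $p < \epsilon/\log Z < 1$, and $r$, the number of sketches, is polylogarithmic in $m$, and $sf(p)$ is a constant,
from Lemma 4 we have:

Theorem 5. With probability $1 - (r + 1)\delta$, Algorithm 1 computes an $\alpha'$-pan-private approximation $\hat{D}$ of
$D^{(t)}$ such that

$$(1 - \epsilon) D^{(t)} - O\left( \mathrm{poly}\left( \log(m) \cdot (1 + \epsilon) \log(\tfrac{1}{\delta}) \tfrac{1}{\alpha'} \right) \right) \le \hat{D} \le (1 + \epsilon) D^{(t)} + O\left( \mathrm{poly}\left( \log(m) \cdot (1 + \epsilon) \log(\tfrac{1}{\delta}) \tfrac{1}{\alpha'} \right) \right).$$

Proof. By Lemma 2 with $\alpha = \alpha'/r$, we have

$$\xi = \left( \frac{2 \cdot Z\, r \max_j \|X_j\|_\infty}{\alpha'} \log(\tfrac{1}{\delta}) \right)^p,$$

and $Z^p < e^{\epsilon}$, which for small $\epsilon$ is approximately $(1 + \epsilon)$. Since $r$ is polylog in $m$ and $\max_j \|X_j\|_\infty$ is a constant,
the result follows.

□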

In fact, this algorithm is a streaming algorithm since it stores space polylogarithmic in $m$ and takes time
polylogarithmic in $m$ per new update. Technically, it works in the turnstile model since $d_j$ may be positive
or negative, the first such pan-private streaming algorithm [5].

The best previous result for pan-private distinct count estimation is due to Dwork et al. [4]. Their
algorithm outputs an estimate in $[D^{(t)} - \alpha m, D^{(t)} + \alpha m]$ with probability $1 - \delta$ for any constant $\alpha$ and
$\delta$. By extending their techniques and running their algorithm in full space, we can get an estimate in
$[D^{(t)} - O(\sqrt{m}), D^{(t)} + O(\sqrt{m})]$ with constant probability (see Section 4). Our sketching algorithm achieves
a significantly smaller error whenever $D^{(t)} = o(\sqrt{m})$; we note that in practice the distinct count statistic is
usually much smaller than the size of the universe.



3.4 Lower bounds 

Next we present lower bounds against pan-private algorithms that allow a single intrusion. These are the 
first such lower bounds in the literature and may be of independent interest. 

We show that if only an additive approximation is allowed, the full-space extension of Dwork et al.'s algorithm for distinct count estimation, as presented in Section 4, is optimal. Thus, the multiplicative approximation factor in the analysis of our sketching distinct counts algorithm is necessary. Furthermore, by proving a new noisy decoding theorem, we show that our sketching algorithm gives an almost optimal bi-approximation guarantee. Interestingly, our lower bounds make no assumptions on the space complexity of the algorithm, and yet the (almost) optimal algorithm happens to use polylogarithmic space.



Dinur-Nissim Style Decoding. Our lower bounds utilize a decoding algorithm of the style introduced in a privacy context by Dinur and Nissim [3]. Informally, we argue that the (private) state of an accurate pan-private algorithm can be used to recover the majority of the algorithm's input. First, we introduce the decoding results we will use.

Theorem 6 ([3]). Let $x \in \{0,1\}^n$. For any $\epsilon$ and $n > n_\epsilon$, the following holds. Given $O(n \log^2 n)$ random strings $q_1, \ldots, q_t \in_R \{0,1\}^n$, and approximate answers $a_1, \ldots, a_t$ s.t. $\forall i \in [t] : |x \cdot q_i - a_i| = o(\sqrt{n})$, there exists an algorithm that outputs a string $\hat{x} \in \{0,1\}^n$ such that, except with negligible probability, $\|x - \hat{x}\|_0 \le \epsilon n$.

In follow up work, [7] strengthened the above and showed that decoding is possible even when a constant 
fraction of the queries are inaccurate. 

Theorem 7 ([7]). Given $\rho < \rho^*$, where $\rho^*$ is a constant approximately equal to 0.239, there exists a constant $\epsilon$ s.t. the following holds. Let $x \in \{0,1\}^n$. There exists a matrix $A \in \{-1,1\}^{n \times m}$ for some $m = O(n)$ and an efficient algorithm $\mathcal{A}$, s.t. on input $b \in \mathbb{N}^m$ satisfying $|\{i : |(Ax - b)_i| > \alpha\}| \le \rho$, $\mathcal{A}$ outputs $\hat{x} \in \{0,1\}^n$ and, with probability $1 - e^{-O(m)}$, $\|x - \hat{x}\|_0 \le \epsilon\alpha^2$.

Next we will prove a result that is similar to Dinur and Nissim's but uses "union queries" as opposed to 
dot product queries. 
Theorem 8. Let $x \in \{0,1\}^n$, $\|x\|_0 \le C\log^c n$ for some constants $c$ and $C$. For any $\epsilon$ and $n > n_\epsilon$ the following statement holds. There exist $n^{O(\log^c n)}$ binary strings $q_1, \ldots, q_t \in \{0,1\}^n$ and an algorithm $A$ such that given answers $a_1, \ldots, a_t$ satisfying

$$\forall i : (1-\alpha_1)\|x + q_i\|_0 - \alpha_2 \le a_i \le (1+\alpha_1)\|x + q_i\|_0 + \alpha_2$$

for $\alpha_2 = o(\log^c n)$, $A$ outputs $\hat{x}$ with $\|x - \hat{x}\|_0 \le C\log^c n$.

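Proof. Let $L$ be an upper bound on $\|x\|_0$, i.e. $L = C\log^c n$. The set of queries is $q_0 = (0, \ldots, 0)$, and $q_1, \ldots, q_t$ are the indicator vectors of all subsets of $[n]$ of size at most $L$. The algorithm outputs any string $\hat{x}$ s.t. $\|\hat{x}\|_0 \le L$ and $\hat{x}$ satisfies all the following constraints:

$$\forall i : (1-\alpha_1)\|\hat{x} + q_i\|_0 - \alpha_2 \le a_i \le (1+\alpha_1)\|\hat{x} + q_i\|_0 + \alpha_2.$$

Clearly the algorithm terminates, as at least one string, namely $x$, satisfies all constraints. Choose $\epsilon$ so that $\alpha_2 \le \epsilon L$. Next we argue that if $\|x - \hat{x}\|_0 > \frac{16(\alpha_1+\epsilon)}{1-\alpha_1}L$, at least one of the above constraints is violated. We will consider several cases. Let $b = \frac{2(\alpha_1+\epsilon)}{1-\alpha_1}L$. Assume first that $\|x\|_0 - \|\hat{x}\|_0 > b$. Then the answer $a_0$ to the query $q_0$ satisfies

$$\begin{aligned}
a_0 &\ge (1-\alpha_1)\|x\|_0 - \alpha_2 \\
&> (1-\alpha_1)(\|\hat{x}\|_0 + b) - \alpha_2 \\
&= (1+\alpha_1)\|\hat{x}\|_0 + \alpha_2 - \left(2\alpha_1\|\hat{x}\|_0 + 2\alpha_2 - (1-\alpha_1)b\right) \\
&\ge (1+\alpha_1)\|\hat{x}\|_0 + \alpha_2 - \left(2(\alpha_1+\epsilon)L - (1-\alpha_1)b\right) \\
&= (1+\alpha_1)\|\hat{x}\|_0 + \alpha_2.
\end{aligned}$$

We have shown that a constraint is violated by $\hat{x}$ in this case. The case $\|\hat{x}\|_0 - \|x\|_0 > b$ is argued analogously.

Finally, assume that $\|x\|_0 - \|\hat{x}\|_0 \in [-b, b]$. Let $q'$ be the indicator vector of the set $\{i : x_i = 0, \hat{x}_i = 1\}$, and, similarly, let $q''$ be the indicator vector of the set $\{i : x_i = 1, \hat{x}_i = 0\}$. Since, by assumption, $\|x - \hat{x}\|_0 = \|q'\|_0 + \|q''\|_0 > \frac{16(\alpha_1+\epsilon)}{1-\alpha_1}L$, it follows that $\max(\|q'\|_0, \|q''\|_0) > \frac{8(\alpha_1+\epsilon)}{1-\alpha_1}L$. Assume, without loss of generality, that $\|q'\|_0 \ge \frac{8(\alpha_1+\epsilon)}{1-\alpha_1}L$. Since $\|q'\|_0 \le \|\hat{x}\|_0 \le L$, $q'$ is one of the queries. We have the following relations:

$$\begin{aligned}
\|q' + x\|_0 &= \|x\|_0 + \|q'\|_0, \\
\|q' + \hat{x}\|_0 &= \|\hat{x}\|_0 \le \|\hat{x}\|_0 + \|q'\|_0 - \frac{8(\alpha_1+\epsilon)}{1-\alpha_1}L, \\
\|q' + x\|_0 - \|q' + \hat{x}\|_0 &\ge \frac{8(\alpha_1+\epsilon)}{1-\alpha_1}L - b.
\end{aligned}$$

Let $a$ be the approximate answer to the query $\|q' + x\|_0$. Then

$$\begin{aligned}
a &\ge (1-\alpha_1)\|x + q'\|_0 - \alpha_2 \\
&\ge (1-\alpha_1)\left(\|\hat{x} + q'\|_0 - 2b + \frac{8(\alpha_1+\epsilon)}{1-\alpha_1}L\right) - \alpha_2 \\
&= (1+\alpha_1)\|\hat{x} + q'\|_0 + \alpha_2 - \left(2\alpha_1\|\hat{x} + q'\|_0 + 2\alpha_2 + 2(1-\alpha_1)b - 8(\alpha_1+\epsilon)L\right) \\
&\ge (1+\alpha_1)\|\hat{x} + q'\|_0 + \alpha_2 - \left(2\alpha_1 L + 2\epsilon L + 4(\alpha_1+\epsilon)L - 8(\alpha_1+\epsilon)L\right) \\
&> (1+\alpha_1)\|\hat{x} + q'\|_0 + \alpha_2.
\end{aligned}$$

Therefore, the constraint is violated and this completes the proof. □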
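
The decoder in the proof above is a brute-force search, and a toy version may make it concrete. The following is a minimal sketch under stated assumptions: sets stand in for binary strings, so $\|x + q\|_0$ becomes the size of a union, and the function names (`small_subsets`, `is_consistent`, `decode`) are illustrative and not taken from the paper. It enumerates every candidate of size at most $L$ and returns the first one that satisfies all constraints.

```python
from itertools import combinations


def small_subsets(n, L):
    """All subsets of {0, ..., n-1} of size at most L, as frozensets."""
    return [frozenset(s) for r in range(L + 1) for s in combinations(range(n), r)]


def is_consistent(cand, queries, answers, alpha1, alpha2):
    """Check (1-a1)|cand U q| - a2 <= answer <= (1+a1)|cand U q| + a2 for every q."""
    for q, ans in zip(queries, answers):
        u = len(cand | q)
        if not ((1 - alpha1) * u - alpha2 <= ans <= (1 + alpha1) * u + alpha2):
            return False
    return True


def decode(n, L, answers, alpha1, alpha2):
    """Return the first candidate of size <= L that satisfies all constraints."""
    queries = small_subsets(n, L)
    for cand in queries:  # the candidates are exactly the sets of size <= L
        if is_consistent(cand, queries, answers, alpha1, alpha2):
            return cand
    return None


# Toy instance: n = 8, sparsity bound L = 2.
x = frozenset({1, 5})
exact_answers = [len(x | q) for q in small_subsets(8, 2)]
recovered = decode(8, 2, exact_answers, 0.0, 0.0)

# Same instance with every answer shifted by an additive error of 0.25.
noisy_answers = [a + 0.25 for a in exact_answers]
recovered_noisy = decode(8, 2, noisy_answers, 0.0, 0.25)
```

In this toy setting both calls return `x`: any two candidates that agree on every union query up to an error below 1/2 must coincide. The search is exponential in $L$, which matches the $n^{O(\log^c n)}$ query count in the theorem and is acceptable for a lower-bound argument.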



Lower Bounds from Noisy Decoding. We introduce our approach to proving lower bounds for pan-private algorithms using the most direct argument first: a lower bound against dot product. We introduce the problem first.

Problem 1. Input is a sequence of updates $S_t$ followed by a sequence $S'_t$.

Output: Let $a$ be the state of sequence $S_t$, and let $a'$ be the state of $S'_t$. Output $a \cdot a' \pm \alpha = \sum_{i \in U} a_i a'_i \pm \alpha$, where $\alpha$ is an approximation factor.
Theorem 9. Let $A$ be a streaming algorithm that on input streams $S_t$, $S'_t$ outputs $a \cdot a' \pm o(\sqrt{m})$ with probability at least $1 - O(m^{-2})$. Then $A$ is not $\epsilon$-pan private for any constant $\epsilon$.

Proof. Fix a stream $S_t$ s.t. $\forall i \in U : a_i \in \{0,1\}$. Let the internal state of the algorithm $A$ after processing $S_t$ be $X$. By the definition of pan privacy, $X$ is $\epsilon$-differentially private with respect to $S_t$. Fix some constants $\delta$ and $\eta$. We will show that for all large enough $m$, any algorithm $Q$ that takes as input $X$ and a stream $S'_t$ and outputs $a \cdot a' \pm o(\sqrt{m})$ with probability at least $1 - O(m^{-2})$ can be used to recover $a_i$ exactly for all but an $\eta$ fraction of $i \in U$ with probability $1 - \delta$. Therefore, the existence of such an algorithm $Q$ implies that $X$ cannot be $\epsilon$-differentially private for any fixed $\epsilon$. Indeed, assume for the sake of contradiction that an algorithm with the given properties exists and $X$ is $\epsilon$-differentially private. Since $Q$ depends only on $X$ and not on $S_t$, the output of $Q$ is also $\epsilon$-differentially private. This is a contradiction, since the output of $Q$ can be used to guess a bit of the binary vector $a$ accurately with probability at least $(1 - 2\delta - \eta)$, where $\delta$ and $\eta$ can be chosen arbitrarily small.

To finish the proof we show that an algorithm $Q$ with the specified properties can be used to recover all but an $\eta$ fraction of $a$ with probability $1 - \delta$. To see this, observe that $Q$ can be used to answer queries $a \cdot q$ for any arbitrary $q$ to within $o(\sqrt{m})$ additive error. In particular, to answer queries $a \cdot q_1, \ldots, a \cdot q_r$, run $Q(X, S'^{(1)}), \ldots, Q(X, S'^{(r)})$ in parallel, where $S'^{(i)}$ is a stream with state $q_i$. If $r = o(m^2)$, then, by the union bound, with probability $1 - \delta$ for any constant $\delta$, $Q(X, S'^{(i)}) = a \cdot q_i \pm o(\sqrt{m})$ for all $i$. By Theorem 6 there exists an algorithm that, given the outputs of $Q(X, S'^{(1)}), \ldots, Q(X, S'^{(r)})$, outputs $\hat{a}$ s.t. except with negligible probability $\hat{a}$ agrees with $a$ on all but an $\eta$ fraction of the coordinates. □

Notice that the lower bound relies on the fact that the updates for $S_t$ arrive before any of the updates of $S'_t$. This restriction can be relaxed. In general, we get a lower bound of $\Omega(\sqrt{m_0})$ for the additive error, where $m_0$ is the largest number of items in $S$ that are updated before any of the corresponding items in $S'$. The lower bound is interesting whenever the updates to the two sequences are not "synchronized", i.e. $(i,d) \in S_t$ and $(i,d') \in S'_t$ for the same $i \in U$ are allowed to arrive at different time steps.

Recall that the distinct count for $S_t$ is $D^{(t)}$. We have the following corollary.

Corollary 1. Let $A$ be an online algorithm that on input $S_t$ outputs $D^{(t)} \pm o(\sqrt{m})$ with probability at least $1 - O(m^{-2})$. Then $A$ is not $\epsilon$-pan private for any constant $\epsilon$.

Proof. Notice that the proof of Theorem 9 goes through if we restrict the instances to be binary, i.e. if we require that $\forall i \in [m] : a_i, a'_i \in \{0,1\}$. The corollary follows by a reduction from this restricted dot-product problem to the distinct elements problem. Given binary streams $S'_t$, $S''_t$ with states $a$ and $a'$, let $S_t = (S'_t, S''_t)$ be their concatenation. By a simple application of inclusion-exclusion, $D^{(t)}(S) = D^{(t)}(S') + D^{(t)}(S'') - a \cdot a'$. Therefore, an $\epsilon$-pan private algorithm for $D^{(t)}$ that achieves additive approximation $\alpha$ with probability $1 - \delta$ implies a $3\epsilon$-pan private algorithm for dot product on binary instances that achieves additive approximation $3\alpha$ with probability $1 - 3\delta$. □
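
The inclusion-exclusion step in this reduction can be checked on a small example. The sketch below is illustrative only: streams are modeled as lists of item IDs with unit updates, and the helper names (`state`, `distinct_count`, `dot_from_distinct_counts`) are assumptions made for the example, not notation from the paper.

```python
def state(stream, m):
    """State vector of a unit-update stream over the universe {0, ..., m-1}."""
    a = [0] * m
    for i in stream:
        a[i] += 1
    return a


def distinct_count(stream, m):
    """Number of items with a nonzero entry in the state vector."""
    return sum(1 for v in state(stream, m) if v != 0)


def dot_from_distinct_counts(s1, s2, m):
    """Recover a . a' for binary streams from three distinct-count queries."""
    return distinct_count(s1, m) + distinct_count(s2, m) - distinct_count(s1 + s2, m)


m = 8
s1 = [0, 2, 5]        # binary stream: each item appears at most once
s2 = [2, 3, 5, 7]
a1, a2 = state(s1, m), state(s2, m)
direct = sum(u * v for u, v in zip(a1, a2))
via_counts = dot_from_distinct_counts(s1, s2, m)
```

Each of the three distinct-count queries contributes its own error and privacy cost, which is where the factors of 3 in the statement of the reduction come from.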

The next two theorems follow by arguments identical to the one used to prove Theorem 9, but using, respectively, Theorem 7 and Theorem 8 in place of Theorem 6.

Theorem 10. Let $A$ be an online algorithm that on inputs $S_t$, $S'_t$ outputs $a \cdot a' \pm o(\sqrt{m})$ with probability at least $1 - \delta$. If $\delta < \frac{\rho^*}{2(1+\eta)}$ for any $\eta$, then $A$ is not $\epsilon$-pan private for any constant $\epsilon$.

Proof. The proof is analogous to the proof of Theorem 9. Note first that the $\{-1,1\}$ queries of Theorem 7 can be simulated as the difference of two $\{0,1\}$ queries, which gives $o(\sqrt{m})$ additive error with probability at least $1 - 2\delta$. In order to apply Theorem 7 we need to guarantee that at most a $\rho < \rho^*$ fraction of the queries answered by $Q$ have error $\Omega(\sqrt{m})$. Call such queries inaccurate. In expectation, at most a $2\delta$ fraction of the queries are inaccurate. Since the statement of Theorem 7 holds when the queries are independent, an application of a Chernoff bound with a large enough number of queries shows that, except with negligible probability, at most a $\rho^*$ fraction of the queries are inaccurate. After applying Theorem 7 the proof can be finished analogously to the proof of Theorem 9. □

Corollary 2. Let $A$ be an online algorithm that on input $S$ outputs $D^{(t)}(S) \pm o(\sqrt{m})$ with probability at least $1 - \delta$. If $\delta < \frac{\rho^*}{6(1+\eta)}$, then $A$ is not $\epsilon$-pan private for any constant $\epsilon$.

This corollary implies the optimality of the full-space distinct counts estimation algorithm presented in Section 4 when only additive approximations are allowed.

Using similar arguments, we can show the following (proof omitted). 

Theorem 11. Let $A$ be a streaming algorithm that on input a stream $S_t$ and any constant $\alpha$ outputs $(1 \pm \alpha)D^{(t)} \pm o(\log^c m)$ with probability at least $1 - n^{-O(\log^c m)}$. Then $A$ is not $\epsilon$-pan private for any constant $\epsilon$.

The theorem establishes that when an arbitrarily small multiplicative approximation factor is allowed, 
an additive polylogarithmic error is unavoidable for the problem of estimating distinct counts. Thus, up to 
the exact order of the polylogarithmic additive factor, our sketching algorithm for distinct count estimation 
is optimal. 

4 Heavy Hitters 

We provide further evidence for the usefulness of sketching for pan-private algorithms by presenting an 
improved algorithm for the Heavy Hitters problem. As a tool we use a variant of the cropped mean estimator 
from [1] , but we combine it with a sketching approach in the style of CM sketches [T , instead of the sampling 
approach used in [I4|. This will allow us to significantly reduce the approximation error by reducing the 
universe size while approximately preserving the number of heavy items. Once again, we observe that 
polylogarithmic space complexity is a by-product of the improved approximation ratio. 

4.1 Full Space Cropped Sum 

We begin with an analysis of the cropped sum estimator in full space. For completeness we describe the 
estimator. We will approximate $T_1(\tau)$ for a universe $U$ and a sequence of updates $S_t$.

Let $\mathcal{D}_0$ be the uniform distribution over $\{0,1\}$ and $\mathcal{D}_1$ be the distribution that assigns probability $1/2 + \epsilon/4$ to 1 and the remaining probability to 0. We compute an estimate $\hat{T}_1(\tau)$ of $T_1(\tau)$ as follows:

• For each $j \in U$, initialize a counter $c_j \in_R \{0, \ldots, \tau - 1\}$ and a bit $b_j \sim \mathcal{D}_0$.

• When item $j$ arrives on the stream, increment the counter $c_j$ (mod $\tau$). If $c_j = 0$, pick $b_j$ from $\mathcal{D}_1$.

• At query time, compute $o := |\{j : b_j = 1\}|$, and output $\hat{T}_1(\tau) = (o - |U|/2)\frac{4\tau}{\epsilon}$.

Note that this algorithm is simply an instantiation of the cropped mean estimator from [3] in full space. 
Keeping counters for each element allows us to guarantee smaller additive error in terms of m. 

Lemma 5. The estimator $\hat{T}_1(\tau)$ is $\epsilon$-differentially private. Moreover, with probability $1 - 2e^{-2\alpha}$,

$$|\hat{T}_1(\tau) - T_1(\tau)| \le \frac{4\tau\sqrt{\alpha|U|}}{\epsilon}.$$

Proof. The privacy analysis is identical to the privacy analysis of the cropped mean estimator in [3]. By the analysis in [3], $E[o] = |U|/2 + \epsilon T_1(\tau)/4\tau$, and, therefore, $E[\hat{T}_1(\tau)] = T_1(\tau)$. By Hoeffding's bound, $\Pr[|o - E[o]| > \sqrt{\alpha|U|}] \le 2e^{-2\alpha}$. The lemma follows. □

Note that setting the cropping parameter $\tau$ to 1 gives an estimate of the distinct count $D$ in full space with $O(\sqrt{m})$ additive error.
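
The estimator of this subsection can be simulated directly. What follows is a minimal sketch, assuming unit increments only and a privacy parameter $\epsilon \le 2$ so that $1/2 + \epsilon/4$ is a valid probability; the class name `CroppedSum` and its methods are illustrative and do not come from the paper. The final comparison uses the deviation $4\tau\sqrt{\alpha|U|}/\epsilon$, which Hoeffding's inequality guarantees except with probability $2e^{-2\alpha}$.

```python
import math
import random


class CroppedSum:
    """Full-space cropped sum estimator for unit increments (illustrative sketch)."""

    def __init__(self, universe_size, tau, eps, rng):
        self.n, self.tau, self.eps, self.rng = universe_size, tau, eps, rng
        # Random initial counters in {0, ..., tau-1} and uniform initial bits (D_0).
        self.counter = [rng.randrange(tau) for _ in range(universe_size)]
        self.bit = [rng.randrange(2) for _ in range(universe_size)]

    def update(self, j):
        """Process one occurrence of item j."""
        self.counter[j] = (self.counter[j] + 1) % self.tau
        if self.counter[j] == 0:
            # Redraw the bit from D_1: Pr[bit = 1] = 1/2 + eps/4.
            self.bit[j] = 1 if self.rng.random() < 0.5 + self.eps / 4 else 0

    def estimate(self):
        """Debias the number of set bits and rescale by 4*tau/eps."""
        o = sum(self.bit)
        return (o - self.n / 2) * 4 * self.tau / self.eps


def cropped_sum(freqs, tau):
    """Exact cropped sum: sum over items of min(frequency, tau)."""
    return sum(min(a, tau) for a in freqs)


rng = random.Random(0)
n, tau, eps, alpha = 200, 4, 1.0, 8.0
freqs = [rng.randrange(0, 10) for _ in range(n)]
sketch = CroppedSum(n, tau, eps, rng)
for j, a in enumerate(freqs):
    for _ in range(a):
        sketch.update(j)

error = abs(sketch.estimate() - cropped_sum(freqs, tau))
bound = 4 * tau * math.sqrt(alpha * n) / eps  # fails with probability <= 2*exp(-2*alpha)
```

With these parameters the bound is 640 against a true value of at most 800, which illustrates why the dependence on the universe size is the limiting factor of the full-space estimator.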

4.2 HH Algorithm 

The limiting factor in the cropped sum estimator is $m$. Even though we allow the algorithm full space and it achieves pan-privacy, the approximation guarantees involve an additive factor in $m$, which is large. The key step in our algorithm is to project the input $S$ onto $S'$ over a much smaller universe, so that $S'$ has approximately the same $k$-heavy hitters count. In fact, we are able to reduce the universe size to a constant that depends only on $k$ and the desired approximation guarantee. The reduced universe size directly implies a more accurate cropped sum estimate and, hence, a more accurate estimate of the number of $k$-heavy hitters. Next we present our algorithm.

Assume the value $F_1 = F_1^{(t_0)}$, where $t_0$ is the time step when the algorithm will be queried, is known ahead of time. Assume also we have oracle access to a random function $f : [m] \to [h]$ (these assumptions will be removed in Section 5). Given a sequence of updates $S$, let $f(S)$ be the sequence $(f(i_1), d_1), \ldots, (f(i_t), d_t)$, and let $T_1(\tau|f)$ and $\hat{T}_1(\tau|f)$ be, respectively, $T_1(\tau)$ and $\hat{T}_1(\tau)$ computed on the stream $f(S)$. Note that $f(S)$ is a stream over the universe $[h]$ and can easily be simulated online given the oracle for $f$.

• Choose a random function $f : U \to [h]$. Compute $x_1 = \hat{T}_1(F_1/k|f)$ and $x_2 = \hat{T}_1(F_1/ck|f)$. Output

$$\widehat{HH}(k) := (x_1 - x_2)\left(\frac{F_1}{k} - \frac{F_1}{ck}\right)^{-1}.$$



The above algorithm will be accurate provided that the function / approximately preserves the number 
of heavy hitters. In the next section we show that a random / satisfies this condition with high probability. 
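
To see why a difference of two cropped sums counts heavy buckets, it helps to look at the noise-free version of the ratio. The sketch below is an illustration under assumptions: `loads` stands for hypothetical per-bucket totals of the projected stream $f(S)$, exact cropped sums replace the private estimates $x_1$ and $x_2$, and the function names are not from the paper.

```python
def cropped_sum(loads, tau):
    """Exact cropped sum of the bucket loads."""
    return sum(min(n, tau) for n in loads)


def hh_ratio(loads, f1, k, c):
    """Noise-free analogue of the output: (T(F1/k) - T(F1/ck)) / (F1/k - F1/ck)."""
    hi, lo = f1 / k, f1 / (c * k)
    return (cropped_sum(loads, hi) - cropped_sum(loads, lo)) / (hi - lo)


loads = [10, 3, 1, 6]          # hypothetical bucket loads, with F_1 = 20
f1, k, c = 20, 4, 2            # thresholds F_1/k = 5 and F_1/(ck) = 2.5
ratio = hh_ratio(loads, f1, k, c)
at_least_high = sum(1 for n in loads if n >= f1 / k)
above_low = sum(1 for n in loads if n > f1 / (c * k))
```

Every bucket at or above the high threshold contributes exactly 1 to the ratio, and every bucket strictly between the two thresholds contributes a fraction in $(0, 1)$, so the ratio is sandwiched between the two counts.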

4.3 Reducing the Universe Size 

Remember that we denote $a_i = \sum_{j : i_j = i} d_j$.

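Lemma 6. Let $f : U \to [h]$ be a random function. Also, let $\hat{k} = |\{j : \exists i \in f^{-1}(j) \text{ s.t. } a_i \ge t/k\}|$. With probability $1 - \delta$,

$$\frac{\hat{k}}{HH(k)} \ge 1 - \frac{k}{\delta h}.$$

Proof. Let the indicator random variable $I_j$ be equal to 1 iff $\forall i \in f^{-1}(j) : a_i < t/k$. The expected value of $I_j$ for any $j$ is as follows:

$$E[I_j] = \left(1 - \frac{1}{h}\right)^{HH(k)} \le \exp(-HH(k)/h).$$

Denote, for convenience, $r := h/HH(k)$. We can write $\hat{k}$ in terms of $I_j$:

$$E[\hat{k}] = \sum_{j \in [h]} (1 - E[I_j]) \ge h(1 - e^{-1/r}), \qquad E\left[\frac{\hat{k}}{HH(k)}\right] \ge r(1 - e^{-1/r}).$$

Using the inequality $e^x \le 1 + x + x^2$ (valid for $x \in [-1, 1]$), we simplify to

$$E\left[\frac{\hat{k}}{HH(k)}\right] \ge r\left(1 - 1 + \frac{1}{r} - \frac{1}{r^2}\right) = 1 - \frac{1}{r}.$$

We can apply Markov's inequality to the random variable $(HH(k) - \hat{k})/HH(k) \ge 0$. Therefore, with probability $1 - \delta$,

$$\frac{\hat{k}}{HH(k)} \ge 1 - \frac{1}{\delta r} \ge 1 - \frac{k}{\delta h}.$$

□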
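
The expectation bound in this proof can be checked numerically. The sketch below is illustrative only: `heavy` stands for $HH(k)$, the number of heavy items hashed uniformly into $h$ buckets, and the function names are assumptions made for the example.

```python
import math


def expected_occupied(h, heavy):
    """Exact E[number of buckets containing at least one heavy item]."""
    return h * (1 - (1 - 1 / h) ** heavy)


def exponential_lower_bound(h, heavy):
    """The bound h * (1 - exp(-heavy / h)) used in the proof."""
    return h * (1 - math.exp(-heavy / h))


def linear_lower_bound(h, heavy):
    """The simplified bound heavy * (1 - 1/r) with r = h / heavy."""
    r = h / heavy
    return heavy * (1 - 1 / r)


checks = [
    (h, heavy, expected_occupied(h, heavy),
     exponential_lower_bound(h, heavy), linear_lower_bound(h, heavy))
    for h in (10, 50, 200)
    for heavy in (1, 5, 10)
]
```

For each pair in `checks` the three values are ordered from largest to smallest, and the gap closes as $h$ grows relative to the number of heavy items, which is the regime the lemma is used in.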



In the next lemma we show that we can project the universe onto a significantly smaller universe without 
creating "new" heavy hitters. 

Lemma 7. Let $A \subseteq U$ be a set of items s.t. $\forall i \in A : a_i \le F_1\delta/2k^2$. Also, let $f : U \to [h]$ be a pairwise-independent hash function. There exists an $h_0 = \Theta(k)$, s.t. for any $h \ge h_0$, with probability at least $1 - \delta$,

$$\forall j \in [h] : \sum_{i \in A \cap f^{-1}(j)} a_i \le F_1/k.$$

Proof. Let $N_j = \sum_{i \in A \cap f^{-1}(j)} a_i$, i.e. $N_j$ is the total frequency of the items mapped to $j$ by $f$. It's easy to see that $E[N_j] = \frac{1}{h}\sum_{i \in A} a_i \le F_1/h$. Let's analyze the variance. Let $X_{i,j}$ be the indicator variable for the event $\{f(i) = j\}$. By pairwise independence, $\mathrm{Var}(N_j) = \sum_{i \in A} \mathrm{Var}(X_{i,j}a_i)$.

$$\mathrm{Var}(X_{i,j}a_i) = E[(X_{i,j}a_i)^2] - E[X_{i,j}a_i]^2 = a_i^2\left(\frac{1}{h} - \frac{1}{h^2}\right).$$

Therefore, $\mathrm{Var}(N_j) = \left(\frac{1}{h} - \frac{1}{h^2}\right)\sum_{i \in A} a_i^2$. We will denote $\sum_{i \in A} a_i^2$ as $F_2(A)$.

Fact 1. If $\sum_{i \in A} a_i \le F_1$ and $\forall i \in A : a_i \le \rho F_1$ for some $\rho \in [0, 1]$, then $F_2(A) = \sum_{i \in A} a_i^2 \le \rho(F_1)^2$.

Proof. Let $a$ be a vector that maximizes $F_2(A)$. We may assume without loss of generality that $\sum_{i \in A} a_i = F_1$. Then either zero or at least two coordinates in $a$ can be in the open interval $(0, \rho F_1)$. We claim that there exists a maximum $a$ s.t. all coordinates are equal to either 0 or $\rho F_1$. Assume, for contradiction, that there exist $i$ and $i'$ in $A$ s.t. $0 < a_i < \rho F_1$ and $0 < a_{i'} < \rho F_1$. Let $a_i \ge a_{i'}$. Then changing $a_i$ to $a_i + 1$ and $a_{i'}$ to $a_{i'} - 1$ strictly increases $F_2(A)$, which is a contradiction. □

By Fact 1, $F_2(A) \le (F_1)^2\delta/2k^2$. Set $h \ge h_0 = (\sqrt{2} + 2)k$. By the one-sided Chebyshev inequality,

$$\Pr[N_j > F_1/k] \le \frac{1}{1 + \left(\frac{F_1}{k} - \frac{F_1}{h}\right)^2 / \left(\left(\frac{1}{h} - \frac{1}{h^2}\right)F_2(A)\right)} \le \frac{F_2(A)}{h\left(\frac{F_1}{k} - \frac{F_1}{h}\right)^2} \le \frac{\delta}{2hk^2\left(\frac{1}{k} - \frac{1}{h}\right)^2} = \frac{\delta}{2h\left(1 - \frac{k}{h}\right)^2} \le \frac{\delta}{2h\left(1 - \frac{1}{\sqrt{2}+2}\right)^2} = \frac{\delta}{h}.$$

The lemma follows by a union bound. □
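
The role of the threshold $h_0 = (\sqrt{2}+2)k$ can be seen by evaluating the one-sided Chebyshev (Cantelli) bound numerically. This is a sketch under assumptions: `cantelli_tail` plugs the worst-case variance from Fact 1 and the worst-case mean $F_1/h$ into the bound, and the parameter values are arbitrary choices for illustration.

```python
import math


def cantelli_tail(f1, k, h, f2):
    """One-sided Chebyshev bound on Pr[N_j > F_1/k] with mean at most F_1/h."""
    var = (1 / h - 1 / h ** 2) * f2
    lam = f1 / k - f1 / h
    return var / (var + lam ** 2)


f1, k, delta = 1000.0, 5, 0.1
f2_max = f1 ** 2 * delta / (2 * k ** 2)   # worst case allowed by Fact 1
h0 = (math.sqrt(2) + 2) * k               # about 17.07 for k = 5
h_values = [math.ceil(h0), 2 * math.ceil(h0), 100]
tails = [(h, cantelli_tail(f1, k, h, f2_max), delta / h) for h in h_values]

# At h = h0 the factor 2 * (1 - k/h)^2 equals 1, which is what pins down h0.
critical_factor = 2 * (1 - k / h0) ** 2
```

For every $h \ge h_0$ in `tails` the per-bucket failure probability is at most $\delta/h$, so summing over the $h$ buckets gives total failure probability at most $\delta$.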

We are now ready to analyze $\widehat{HH}(k)$. The following theorem shows that $\widehat{HH}(k)$ is in the range $[(1 - \beta)HH(k) - O(\sqrt{k}), HH(O(k^2)) + O(\sqrt{k})]$ with constant probability.

Theorem 12. $\widehat{HH}(k)$ can be computed while satisfying $2\epsilon$-pan privacy. Moreover, if $h \ge \max\{k/\beta\delta, (\sqrt{2} + 2)ck\}$, then with probability $1 - 2\delta - 4\exp(-\alpha)$,

$$(1 - \beta)HH(k) - \frac{4(c+1)\sqrt{\alpha h}}{(c-1)\epsilon} \le \widehat{HH}(k) \le HH(2c^2k^2/\delta) + \frac{4(c+1)\sqrt{\alpha h}}{(c-1)\epsilon}.$$

Proof. The privacy guarantee follows by the $\epsilon$-pan privacy of the cropped sum estimators and the composition theorem of Dwork et al. [6]. Next we analyze utility.

Computing cropped $F_1$ at two levels of the cropping parameter gives us an approximation of the number of heavy hitters. Writing $N_j$ for the total frequency of the items mapped to $j$ by $f$,

$$\begin{aligned}
T_1(F_1/k|f) - T_1(F_1/ck|f) &= \sum_{j : N_j > F_1/ck} \left(\min(N_j, F_1/k) - F_1/ck\right) \\
&= \sum_{j : N_j \ge F_1/k} \left(F_1/k - F_1/ck\right) + \sum_{j : F_1/ck < N_j < F_1/k} \left(N_j - F_1/ck\right).
\end{aligned}$$

It immediately follows that $|\{j : N_j \ge F_1/k\}| \le E[\widehat{HH}(k)] \le |\{j : N_j > F_1/ck\}|$. By Lemma 6, $|\{j : N_j \ge F_1/k\}| \ge (1 - \beta)HH(k)$ except with probability $\delta$. We can apply Lemma 7 with $A = \{i : a_i \le F_1\delta/2c^2k^2\}$. By the lemma, for every $j \in [h]$ we have $N_j > F_1/ck \Rightarrow \exists i \in f^{-1}(j)$ s.t. $a_i > F_1\delta/(2c^2k^2)$, except with probability $\delta$. Therefore, $|\{j : N_j > F_1/ck\}| \le HH(2c^2k^2/\delta)$. We have thus shown that

$$(1 - \beta)HH(k) \le \left(T_1(F_1/k|f) - T_1(F_1/ck|f)\right)\left(\frac{F_1}{k} - \frac{F_1}{ck}\right)^{-1} \le HH(2c^2k^2/\delta).$$
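Applying Lemma 5 to the stream $f(S)$ over the universe $[h]$ gives us the following guarantees:

$$\Pr\left[|\hat{T}_1(F_1/k|f) - T_1(F_1/k|f)| > \frac{4F_1\sqrt{\alpha h}}{k\epsilon}\right] \le 2e^{-2\alpha} \le 2\exp(-\alpha)$$

$$\Pr\left[|\hat{T}_1(F_1/ck|f) - T_1(F_1/ck|f)| > \frac{4F_1\sqrt{\alpha h}}{ck\epsilon}\right] \le 2e^{-2\alpha} \le 2\exp(-\alpha)$$

Therefore, with probability at least $1 - 4\exp(-\alpha)$,

$$|(x_1 - x_2) - (T_1(F_1/k|f) - T_1(F_1/ck|f))| \le \left(1 + \frac{1}{c}\right)\frac{4F_1\sqrt{\alpha h}}{k\epsilon}.$$

A straightforward computation and a union bound complete the proof. □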

In previous work Dwork et al. [4] present an algorithm that outputs an estimate for $HH(k)$ in $[HH(k(1+\rho)) - \alpha m, HH(k/(1+\rho)) + \alpha m]$ with probability $1 - \delta$. Extending their algorithm to full space can improve the additive error to $O(\sqrt{m})$ with constant probability. For $k = O(1)$, which is the usual range for this parameter, our algorithm outperforms Dwork et al.'s.



5 Extensions 

In the following we extend ideas used in the Heavy Hitters upper bound to other problems. We consider the cropped second moment and inner product problems, which have not been addressed in the context of pan-privacy before. The performance of our inner product algorithm matches the lower bound presented in Section 3.4. We also show how to relax some of the assumptions made in Section 4.

5.1 Inner Products and $T_2$

A simple extension of the cropped sum estimator from Section 4 allows us to estimate the cropped dot product of two sets of updates, as well as the cropped second moment of an input.

Let, as before, $\mathcal{D}_0$ be the uniform distribution over $\{0,1\}$ and $\mathcal{D}_1$ be the distribution that assigns probability $1/2 + \epsilon/4$ to 1 and the remaining probability to 0. We compute an estimate $\widehat{(a \cdot a')}(\tau)$ of $(a \cdot a')(\tau)$ as follows:

• For each $i \in [m]$, independently initialize counters $c_i, c'_i \in_R \{0, \ldots, \sqrt{\tau} - 1\}$, and bits $b_i, b'_i \sim \mathcal{D}_0$.

• When item $i$ arrives as an update in $S$, increment the counter $c_i$ (mod $\sqrt{\tau}$). If $c_i = 0$, pick $b_i$ from $\mathcal{D}_1$. Process the updates in $S'$ analogously.

• At query time,

 - compute $o := |\{i : b_i = b'_i = 1\}|$;

 - output:

$$\widehat{(a \cdot a')}(\tau) = \left(o - \frac{\epsilon\hat{T}_1(\sqrt{\tau})}{8\sqrt{\tau}} - \frac{\epsilon\hat{T}'_1(\sqrt{\tau})}{8\sqrt{\tau}} - \frac{m}{4}\right)\frac{16\tau}{\epsilon^2}.$$


Lemma 8. The estimator $\widehat{(a \cdot a')}(\tau)$ is $2\epsilon$-differentially private. Moreover, with probability $1 - 6e^{-2\alpha}$,


(a-a')(r)-(a-a')(r)| < -V— i 



4^ 



Proof. The proof of privacy follows from the analysis of the cropped means estimator [4]. The utility analysis is also a simple extension, as follows.

By the analysis of [3], for every $i \in [m]$, $\Pr[b_i = 1] = 1/2 + \epsilon\min(a_i, \sqrt{\tau})/4\sqrt{\tau}$, and similarly $\Pr[b'_i = 1] = 1/2 + \epsilon\min(a'_i, \sqrt{\tau})/4\sqrt{\tau}$. Since for every $i$, $b_i$ and $b'_i$ are independent, we have

$$\begin{aligned}
\Pr[b_i = b'_i = 1] &= \left(\frac{1}{2} + \frac{\epsilon\min(a_i, \sqrt{\tau})}{4\sqrt{\tau}}\right)\left(\frac{1}{2} + \frac{\epsilon\min(a'_i, \sqrt{\tau})}{4\sqrt{\tau}}\right) \\
&= \frac{1}{4} + \frac{\epsilon\min(a_i, \sqrt{\tau})}{8\sqrt{\tau}} + \frac{\epsilon\min(a'_i, \sqrt{\tau})}{8\sqrt{\tau}} + \frac{\epsilon^2\min(a_i a'_i, \tau)}{16\tau}.
\end{aligned}$$

Therefore, $E[\widehat{(a \cdot a')}(\tau)] = (a \cdot a')(\tau)$, and the lemma follows by a Hoeffding bound and the guarantees for $\hat{T}_1$. □

Notice that Lemma 8 is valid regardless of whether $S_t$ and $S'_t$ are interleaved in an arbitrary manner. Also, we can take $S_t = S'_t$, and the algorithm gives an estimate for $T_2$, i.e. $T_2(\tau) = (a \cdot a)(\tau)$.
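
The debiasing behind the inner product estimator can be verified with exact rational arithmetic. The sketch below makes several assumptions that are not in the paper: `s` stands for an integer $\sqrt{\tau}$, the exact cropped sums replace their private estimates, the subtracted linear terms use the coefficient $\epsilon/(8\sqrt{\tau})$ read off from the expansion of $\Pr[b_i = b'_i = 1]$, and all function names are illustrative.

```python
from fractions import Fraction


def crop(a, s):
    """Frequency cropped at s = sqrt(tau)."""
    return min(a, s)


def expected_both_ones(a, a2, s, eps):
    """E[o]: sum over items of Pr[b_i = 1] * Pr[b'_i = 1]."""
    total = Fraction(0)
    for u, v in zip(a, a2):
        p = Fraction(1, 2) + eps * crop(u, s) / (4 * s)
        q = Fraction(1, 2) + eps * crop(v, s) / (4 * s)
        total += p * q
    return total


def debias(e_o, t1, t1p, m, s, eps):
    """Subtract the constant and linear terms, then rescale by 16*tau/eps^2."""
    tau = s * s
    linear = eps * t1 / (8 * s) + eps * t1p / (8 * s)
    return (e_o - linear - Fraction(m, 4)) * 16 * tau / eps ** 2


a, a2 = [0, 1, 3, 7], [2, 0, 5, 1]
s, eps = 2, Fraction(1, 2)           # tau = 4
t1 = sum(crop(u, s) for u in a)      # cropped sum of the first stream
t1p = sum(crop(v, s) for v in a2)    # cropped sum of the second stream
value = debias(expected_both_ones(a, a2, s, eps), t1, t1p, len(a), s, eps)
cropped_products = sum(crop(u, s) * crop(v, s) for u, v in zip(a, a2))
```

In expectation the debiased count equals the sum of products of the individually cropped frequencies, here 6. In the actual estimator the linear terms are themselves estimated privately, which is one source of the extra failure probability in Lemma 8.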

5.2 Random Oracle and $F_1$

Two assumptions that we make in Section 4 are that we have oracle access to a random function $f$ and that the value $F_1$ at the time step $t_0$ when the algorithm is queried is known before the sequence of updates is processed. Here we show how these assumptions can be relaxed.

Notice that, assuming a bound $a_i \le U$, our heavy hitters algorithm uses constant space. Therefore Nisan's pseudorandom generator [10] can be used to remove the first assumption. To address the second assumption, we can assume an upper bound $U_0$ on $F_1 = F_1^{(t_0)}$. Then we can run $\log U_0 + 1$ instances of our heavy hitters algorithm in parallel with $F'_1$ (the projected value of $F_1$) set to $1, 2, 4, \ldots, U_0$, respectively. At query time we use the output of the algorithm instance with $F'_1$ set to $2^{\lceil \log F_1 \rceil}$. This procedure gives us a $2(\log U_0 + 1)\epsilon$-pan private algorithm that outputs an estimate $\widehat{HH}(k) \in [(1 - \beta)HH(k) - O(\sqrt{k}), HH(O(k^2)) + O(\sqrt{k})]$.
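
The bookkeeping for the parallel instances is simple enough to sketch. The following is illustrative only: it assumes integer values with $F_1 \ge 1$ and $U_0$ a power of two, and the helper names (`guesses`, `select_guess`, `privacy_cost`) are hypothetical.

```python
def guesses(u0):
    """Guessed values 1, 2, 4, ... up to the first power of two >= u0."""
    out, g = [], 1
    while g < u0:
        out.append(g)
        g *= 2
    out.append(g)
    return out


def select_guess(f1):
    """Smallest power of two >= f1, i.e. 2**ceil(log2(f1)), for an integer f1 >= 1."""
    return 1 << (f1 - 1).bit_length()


def privacy_cost(u0, eps):
    """Total privacy parameter: each instance is 2*eps-pan private and costs compose additively."""
    return 2 * len(guesses(u0)) * eps


instances = guesses(16)        # one heavy hitters instance per guessed value
chosen = select_guess(5)       # instance consulted at query time when F_1 = 5
total_eps = privacy_cost(16, 0.1)
```

The chosen guess overestimates $F_1$ by a factor of less than 2, which only changes the constants in the heavy hitter thresholds, while the privacy parameter grows by the number of instances.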



6 Concluding Remarks 

Inspired by [I], we study pan-private algorithms that guarantee differential privacy of data analyses even 
when the internal memory of the algorithm may be compromised by an unannounced intrusion of an attacker. 
[3] used techniques from randomized response [13] on top of sampling to get pan-private streaming algorithms
for some of the basic statistical estimates on the input. 

We addressed fundamental questions about the memory, its size and its role in pan-privacy. We showed that distinct count cannot be estimated accurately to within a purely additive error, even given unbounded space; this is based on an approach for showing lower bounds via noisy decoding. We also showed a streaming algorithm that is pan-private and matches this accuracy. We also showed a streaming pan-private algorithm for estimating heavy-hitter counts with worst-case $O(k)$ error. Both of these upper bounds come from using sketches. Also, it is interesting that while we do not require pan-private algorithms to use small memory, the best known algorithms so far are streaming, that is, they use sublinear memory.

We find the notion of pan-privacy to be intriguing, and believe more needs to be understood in this intersection of differential privacy and streaming. For example, in streaming, many problems can be solved in the presence of negative and positive $d_t$'s. While our distinct count estimation in this paper works in this case and is pan-private, we leave it open to address the difficulty of obtaining pan-private algorithms for other problems in such cases. Also, the basic model of pan-privacy here can be extended to the case when there are multiple intrusions or even continual intrusions [5]. Under those models, what statistical estimates can be computed accurately and privately?

We conclude this paper with the observation that our insights so far give pan-private approximations for related problems such as $T_2$ (cropped $F_2$) and inner products. We leave it open to extend these results to other problems such as entropy estimation.